Gestural, motion and speech interface control method for 3D audio-video data navigation on handheld devices

ABSTRACT

A cognizant and adaptive method of informing a multi-modal navigation interface of a user's intent. This provides the user with the experience of exploring an immersive representation of the processed multimedia (audio-video-data) sources available that automatically adapts to her/his fruition preference. These results are obtained by first reconciling and aligning the User's and the Device's frames of reference in tri-dimensional space and then dynamically and adaptively Smoothly Switching and/or combining the Gesture, Motion and Speech modalities. The direct consequence is a user experience that naturally adapts to the user's choice of interaction and movement.

This application is a continuation of application Ser. No. 14/254,055 filed Apr. 16, 2014, now U.S. Pat. No. 9,395,764 issued Jul. 19, 2016, which was related to and claimed priority from U.S. Provisional Patent Application No. 61/815,753 filed Apr. 25, 2013. Application Ser. No. 14/254,055 and 61/815,753 are hereby incorporated by reference in their entirety.

BACKGROUND

Field of the Invention

The present invention relates generally to navigation on interactive handheld devices and more particularly to tools that implement a user's 3D navigation experience capable of displaying an interactive rendition of 2D and/or 3D audio-visual data accessed by the device locally and/or streamed via remote systems.

Description of the Prior Art

Gestural interfaces have become increasingly present in the market during the last few years. Consumer electronics manufacturers such as Nintendo, Apple, Nokia, LG, and Microsoft have all released products that are controlled using interactive gestures. Many of them utilize the motion of the human body or of handheld controllers to drive users' interaction with videogames, television menu control and the like.

Most current videogame interfaces on mobile devices like smart phones and tablets already use touch gestures to allow players to execute movements in space or choose actions or commands that are then reflected on-screen. Other categories of hardware devices in the videogames market incorporate gesture driven interfaces, such as game consoles like the Microsoft Xbox™ 360, which uses specific hardware (Kinect) capable of reading user body motion and/or posture or tracking gesture sequences through a sophisticated implementation of image recognition techniques and augmented (3D depth) camera acquisition. Newer devices, like the Leap Motion Controller, generalize some of the motion-tracking paradigm while bringing it out of the videogames domain and into everyday desktop applications (apps).

Panoramic Imagery Navigation applications, like Google Street View, have incorporated both paradigms of motion and gesture, used alternatively to explore geo-referenced street panoramas.

Speech commands are commonly used in standard applications such as Siri™ (for Apple devices), or Loquendo™ (for PC programs), or the Microsoft Inquisit™ speech recognition engine. When the user speaks a command, the speech recognition system is activated, detecting the phonetics and performing the required action. These speech recognition systems usually must be trained to the user's voice.

It would be extremely advantageous to have a system that could take full advantage of the gestural and motion capabilities available to a device using onboard sensors in concert with buttons displayed on the device screen that provides complete navigational capabilities.

SUMMARY OF THE INVENTION

The present invention creates a cognizant and adaptive method of informing a multi-modal navigation interface of a user's intent. This provides the user with the experience of exploring an immersive representation of the processed multimedia (audio-video-data) sources available that automatically adapts to her/his fruition preference. Such sources may include any combination of:

-   3D Computer Generated Simulations or Games.
-   Speech Recognition Data.
-   Surround Positional Audio sources and Virtual Acoustic Environment data.
-   Real World Positional Data (GPS, A-GPS, GLONASS, GNSS, Local/Indoor GPS, CELL ID, Location Pattern Matching (LPM), wireless positioning, radio-based and the like).
-   Geo-referenced data and 3D geo-referenced databases.
-   Augmented Reality reference frames, markers, and miscellaneous data.
-   On-board camera data (Device's own or attached hardware); applied to: tracking, positioning, image processing.
-   Device's on-board (or attached hardware) positioning and attitude sensors like: accelerometer, magnetometer, gyroscope etc.
-   External local or remote positioning data sources.
-   2D-2½D Panoramic Imaging and Video and relative camera data.
-   3D Computer Reconstruction from Imaging sources captured via:
    -   Multiple cameras interpolation and 3D interpretation.
    -   Hybrid Multiple cameras and Depth Capture systems 3D reconstruction.
-   Stereoscopic viewing data.
-   Time coding information relative to multimedia and data sources.

A specific example is given here demonstrating the navigation capabilities of the present invention; the example was developed for the Apple iPad™ device. The concepts presented here can easily be transferred to other environments. Such environments may include various types of mobile devices such as smartphones and tablets as well as other similar cases of gestural, motion and speech sensing enabled hardware.

Possible embodiments include the navigation of tri-dimensional virtual worlds like computer graphic simulations or videogames through the combined use of gesture, motion and speech modes on a mobile device. This provides the user with a navigation experience that is cognizant of the device Positioning, Heading and Attitude (orientation relative to a frame of reference: Cartesian, spherical etc.).

These results are obtained by first reconciling and aligning the User's and the Device's frames of reference in tri-dimensional space and then dynamically and adaptively Smoothly Switching and/or combining the Gesture, Motion and Speech modalities. The direct consequence is a user experience that naturally adapts to the user's choice of interaction and movement.

As an example, using a tablet (like the Apple iPad™ or similar products on the market today) to experience a videogame, the present invention allows the player to alternatively or collaboratively apply Gesture, Speech and Motion modalities. While sitting on a sofa, for instance, the player may prefer to delegate all navigation actions to the Gesture Interface (for example a touch screen interaction in the case of the Apple iPad™), but a sudden “call to action” in the flow of the game, or an audible stimulus (3D audio localization performed by the application), can create the need for an abrupt change of point of view and/or position in the virtual world. This can more efficiently be achieved by letting the player's natural movement (as detected by the Motion Interface) intervene in the simulation, and either collaborate with the Gesture Interface or, in some cases, take control of the navigation system altogether.

DESCRIPTION OF THE FIGURES

Several drawings are presented to illustrate features of the present invention:

FIG. 1 shows control in mode M1; gestural exclusive.

FIG. 2 shows control in mode M2; device motion exclusive.

FIG. 3 shows control in mode M3; speech control exclusive.

FIG. 4 shows control in mode M4; static motion, displaced gestural.

FIG. 5 shows control in mode M5; static gestural, displaced motion.

FIG. 6 shows control in mode M6; static speech, displaced motion.

FIG. 7 shows control in mode M7; static speech, displaced gestural.

FIG. 8 shows control in mode M8; static gestural, displaced speech.

FIG. 9 shows control in mode M9; static motion, displaced speech.

FIG. 10 shows a process flow graph.

Drawings and illustrations have been presented to aid in understanding the present invention. The scope of the present invention is not limited to what is shown in the figures.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a system and method for combining and smoothly switching between gestural, motion and speech control of a handheld device.

The desired level of interaction described in the present invention is obtained by means of an advanced gesture interface system that performs the following tasks:

-   Calibrates the Device World Coordinates and Frame of Reference.
-   Calculates the relevant multi-dimensional data (space, time, elements characteristics and the like) derived from the processed sources.
-   Detects the presence of Gesture actions.
-   Detects the presence of Motion actions.
-   Detects the presence of Speech actions.
-   Detects the Device relative to dynamically changing Heading, Position and Attitude.
-   Determines whether in each of the Navigation Classes there should be a “prevalence” of a chosen modality.
-   Interprets the User's Input: Gesture, Motion and Speech to determine the appropriate tri-dimensional changes and actions (which can take place as 3D Space and/or Time displacement).
-   Performs all the necessary adjustment and re-alignments to the Device World Frame of Reference to allow for a dynamic smooth switching among the different possible modalities.

The elements being described here can be performed on audio-video-data sources obtained via the methods known in the art. Such sources might be available offline to be pre-processed and/or can be streamed and interpreted in real-time by the server and/or the client.

DEFINITIONS

“Data Sources” are sources of 2D or 3D audio-video.

“World” is a multi-dimensional representation of audio-video-data sources that can manifest as:

-   Computer Generated (CG): 3D Videogames, Virtual Simulations.
-   Real World: 2D-2½D Panoramic Imaging and/or Video, 3D reconstruction from various methods.
-   Merged: Real and CG combined World for: Augmented Reality, Mixed Realities and the like.

“User” is a single or multiple entity, human or computer, locally or remotely interacting with the Device.

“Virtual User” is a single or multiple representation of the User in the World.

“Device” is a single or multiple handheld hardware device capable of one or any combination of: displaying, receiving, recording and processing multimedia sources as well as receiving direct or remote input from User or local or remote device and/or computer system.

“Device Vector” is the vector defined by the device Heading, Position and Attitude.

“Device Window” is a single or multiple instance of the device Viewing Frustum as determined by virtual (CG World) or real (Real World) camera lens data and by the Device detection and programming of its own (or attached) hardware camera parameters.

“Gesture Interface” is the category of interactions performed by the user(s) through gestures (touch-motion-expression) with the hardware available on board and/or attached to the device. Examples include: touch screens, gesture detection or user body motion via additional devices like Leap Motion etc., or face expression detection via on-board camera or additional devices.

“Motion Interface” is the category of interactions, performed by the user(s) while moving in free space and holding the device, detected by the hardware available on-board and/or attached/connected to the device. Examples include motion tracking via: accelerometers, gyroscopes, magnetometers, GPS, camera tracking-image processing and other similar sensors. This is to determine, for example, the device heading, position and attitude while the user freely moves it up, down, left, right, back, forward and/or rotates it around its axes.

“Speech Interface” is the category of interactions performed by the user(s) via spoken language with the hardware available on board the device and/or attached/connected to the device along with speech recognition software.

The three modalities, the Gesture, Motion and Speech Interfaces, may act collaboratively, permitting a Smooth Switching among their possible combinations and variations.

The relative and absolute localization data about the device is determined by the vector (Device Vector) defined by its Heading, Position and Attitude information provided by the device's onboard and/or attached sensors (calculated from the raw sensor inputs).

Gesture, Motion and Speech Interfaces utilize user input as detected and classified in two principal Navigation Classes: Static and Displaced.

Static

The Static Navigation Class represents all the types of motions of the Device Vector that do not significantly alter its Position parameters (depending on frame of reference). Examples may include: look around in all directions, tilt up or down (all without lateral displacement).

Displaced

The Displaced Navigation Class represents all the types of motions of the Device Vector that significantly alter its Position parameters (depending on frame of reference). Examples may include: moving forward, back, left, right, up, down.

Users can switch from a pure Gestural use of the interface by performing relevant Motions or Speech (Static or Displaced, as made available by the system). Relevant Motions are motions of the Device Vector, possibly determined by respective user movements (captured by the Device sensors), that exceed a programmed and user-changeable threshold.
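As a rough illustration of how a Relevant Motion might be detected and assigned to a Navigation Class, the following Python sketch compares two successive Device Vector samples against thresholds. The `DeviceVector` fields, the function names and the default threshold values are assumptions made for this example only; the specification does not prescribe units, field names or threshold values. The sketch also assumes that the rotational part of a motion feeds the Static Class and the positional part feeds the Displaced Class.

```python
from dataclasses import dataclass
from enum import Enum
from math import dist
from typing import Set, Tuple


class NavigationClass(Enum):
    STATIC = "static"
    DISPLACED = "displaced"


@dataclass(frozen=True)
class DeviceVector:
    """Hypothetical container for Position, Heading and Attitude."""
    position: Tuple[float, float, float]  # assumed metres in the calibrated frame
    heading: float  # degrees
    pitch: float    # degrees
    roll: float     # degrees


def angle_delta(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((b - a + 180.0) % 360.0 - 180.0)


def relevant_classes(prev: DeviceVector, curr: DeviceVector,
                     position_threshold: float = 0.25,
                     rotation_threshold: float = 5.0) -> Set[NavigationClass]:
    """Return the Navigation Classes for which the motion exceeds its threshold.

    An empty set means the motion is below threshold and is ignored.
    Both thresholds are placeholders for the user-changeable values.
    """
    classes: Set[NavigationClass] = set()
    if dist(prev.position, curr.position) > position_threshold:
        classes.add(NavigationClass.DISPLACED)
    rotation = max(angle_delta(prev.heading, curr.heading),
                   angle_delta(prev.pitch, curr.pitch),
                   angle_delta(prev.roll, curr.roll))
    if rotation > rotation_threshold:
        classes.add(NavigationClass.STATIC)
    return classes
```

Returning a set rather than a single class lets one physical movement contribute to both classes, which is what a motion-exclusive use of the device would need.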

In a possible embodiment, the user can explore a given virtual world with the combined use of the Gesture, Motion and Speech Interfaces, examples of which follow:

Gesture Interface

Gesture Interface (left or right fingers)—Look Around (Static Class)

Up-Down-Left-Right

Gesture Interface (left or right fingers)—Move (Displaced Class)

Walk-Run-Jump-Fly-Duck|Back-Forth-Left-Right

A typical example may be through the use of virtual buttons on a touch screen or a joystick (virtual or real). The user manipulates those buttons with the fingers to change the view toward the left or right, or up or down, in the static class. In the displaced class, the user again uses the fingers, this time to cause the scene being viewed to displace its viewing location. The user, while really at a fixed location, causes the scene to appear so that the user has the impression that he or she is walking, running, flying or the like.

Motion Interface

User moves the device in all directions (Static Class)

Pivot around position in 360 degrees

Here the user actually moves the device in its physical space (usually by holding the device and changing its orientation). The viewed scene follows the motions of the user. Hence, if the user was holding the device pointing horizontally, and then lifts it upward over his head, the scene can go from showing what is in front horizontally to what is above. In the static class, the user does not change his coordinates in 3D space.

User moves her/his body in all directions (Displaced Class)

Walk-Run-Jump-Fly-Duck|Back, Forth, Left, Right

Rotates around her/himself while standing and/or walking/running or the like. Here, the user changes his coordinates in physical space by walking, running or the like.

Speech Interface

Spoken Command—“Look Around” (Static Class)

“Look Up”-“Down”-“Left”-“Right”—“Look At”

Spoken Command—“Move” (Displaced Class)

“Walk”-“Run”-“Jump”-“Fly”-“Duck”|“Back”-“Forth”-“Left”-“Right”

“GoTo” spoken directions like:

“Take me to”:

-   -   “Address”    -   “Position in 3D space”    -   “Relevant locations”

The speech interface can contain a large number of recognizable spoken commands that can be used in a way similar to the gestural commands, except that they are spoken, to move the scene in either the static or the displaced class.
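For illustration only, such a command library could be held as a lookup table that maps each recognized phrase to a Navigation Class and an action. The Python sketch below mirrors a subset of the commands listed above. The names `SPEECH_COMMANDS` and `parse_command`, the tuple layout and the treatment of “take me to” as a displaced-class command are assumptions, and the sketch presumes that the speech recognition software has already produced a text transcript.

```python
from typing import Optional, Tuple

# Hypothetical command library: phrase -> (navigation class, action).
SPEECH_COMMANDS = {
    "look up": ("static", "up"),
    "look down": ("static", "down"),
    "look left": ("static", "left"),
    "look right": ("static", "right"),
    "walk": ("displaced", "walk"),
    "run": ("displaced", "run"),
    "jump": ("displaced", "jump"),
    "fly": ("displaced", "fly"),
    "duck": ("displaced", "duck"),
}

GOTO_PREFIX = "take me to "


def parse_command(transcript: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (navigation_class, action, argument), or None if unrecognized."""
    text = " ".join(transcript.lower().split())  # normalize case and spacing
    if text.startswith(GOTO_PREFIX) and len(text) > len(GOTO_PREFIX):
        # "GoTo" directions carry a free-form destination
        # (address, position in 3D space, relevant location).
        return ("displaced", "goto", text[len(GOTO_PREFIX):])
    entry = SPEECH_COMMANDS.get(text)
    if entry is None:
        return None
    return (entry[0], entry[1], None)
```

A real library would be considerably larger and would likely tolerate recognition errors; exact matching is used here only to keep the example short.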

Prevalence

To maintain order when the user is allowed to make any combination of gestural, motion or speech commands (or subsets of these), it is necessary to give the various interfaces priority values. These will be called prevalence. Prevalence is used to decide what to do when commands are received simultaneously on more than one of the interfaces. The preferred embodiment of the present invention uses the following prevalence:

-   1) Gesture based actions take prevalence over Motion and Speech.
-   2) Motion based actions can take place if a Static or Displaced Navigation Class is not being performed by the Gesture Interface.
-   3) Motion based actions take prevalence over Speech.

This is one possible prevalence assignment. Other combinations are possible. Any choice of prevalence is within the scope of the present invention.
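The preferred prevalence ordering can be sketched in a few lines of Python. The enum values encode the ordering of the preferred embodiment (Gesture, then Motion, then Speech), and the resolution is applied separately to each Navigation Class so that Motion can still drive a class that the Gesture Interface is not using. The names `Interface`, `prevailing` and `resolve` and the dictionary layout are assumptions made for this sketch.

```python
from enum import IntEnum
from typing import Dict, Iterable, Optional


class Interface(IntEnum):
    """Lower value means higher prevalence (preferred-embodiment ordering)."""
    GESTURE = 1
    MOTION = 2
    SPEECH = 3


def prevailing(active: Iterable[Interface]) -> Optional[Interface]:
    """Pick the interface that prevails among those currently active."""
    return min(active, default=None)


def resolve(active_by_class: Dict[str, Iterable[Interface]]
            ) -> Dict[str, Optional[Interface]]:
    """Resolve prevalence independently for each Navigation Class."""
    return {nav_class: prevailing(active)
            for nav_class, active in active_by_class.items()}
```

A different prevalence assignment would only require renumbering the enum members.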

This provides the user with modal variations like (shown in FIGS. 1-9):

M0—A calibration mode where no gestures, motions or speech commands are accepted.

M1—(Gestural Exclusive—All actions related to Navigation Classes are performed via Gestural Interface)

-   Gestural Interface=Static
-   Gestural Interface=Displaced
-   Motion Interface=NULL
-   Speech Interface=NULL

The user typically uses finger manipulations to change what is displayed as shown in FIG. 1. The user can move the view to any direction, move in and out (zoom), or in the displaced class, cause the scene to simulate user motion in the 3D image space. In the Gestural exclusive mode, the physical device is not moved.

M2—(Motion Exclusive—All actions are performed via user physical motions while holding the device)

-   Gesture Interface=NULL
-   Motion Interface=Static
-   Motion Interface=Displaced
-   Speech Interface=NULL

Here the user totally controls the device by physically moving it as shown in FIG. 2. For example, in the displaced class, if the user holds the device in front of him and walks, the displayed scene moves accordingly as though the viewer were moving the camera point of view in the scene.

M3—(Speech Exclusive—All actions are performed via spoken language)

-   Gesture Interface=NULL
-   Motion Interface=NULL
-   Speech Interface=Static
-   Speech Interface=Displaced

Here static and displaced class commands are given exclusively by voice as shown in FIG. 3. Commands are decoded by speech recognition software/hardware from a library of possible commands.

M4—(Displaced gestural, static motion—commands may be entered both by gestures and by static motion).

-   Gesture Interface=Displaced
-   Motion Interface=Static
-   Speech Interface=NULL

Here gestures can be used to control walking, running and the like, with static motions used to determine the direction of view as shown in FIG. 4.

M5—(Static gestural, displaced motion—commands may be entered by both gesture and motion).

-   Gesture Interface=Static
-   Motion Interface=Displaced
-   Speech Interface=NULL

Here running, walking and the like are controlled by actually moving the device, while the direction of view is determined by gestures as shown in FIG. 5.

M6—(Displaced motion, static speech—commands may be given by moving the device and speech).

-   Gesture Interface=NULL
-   Motion Interface=Displaced
-   Speech Interface=Static

Here, running, walking and the like are controlled by moving the device, while the direction of view is determined by speech command as shown in FIG. 6.

M7—(Static speech, displaced gestural—commands may be given both by gestures and speech).

-   Gesture Interface=Displaced
-   Motion Interface=NULL
-   Speech Interface=Static

Here gestures are used to control running, walking and the like, while speech commands determine the direction of view as shown in FIG. 7.

M8—(Static gestural, displaced speech—commands may be given by speech and by gestures).

-   Gesture Interface=Static
-   Motion Interface=NULL
-   Speech Interface=Displaced

Here, speech commands are used to determine running, walking and the like, while gestures determine the direction of view as shown in FIG. 8.

M9—(Static motion, displaced speech—commands may be given by motion and speech).

-   Gesture Interface=NULL
-   Motion Interface=Static
-   Speech Interface=Displaced

Here, speech commands control running, walking and the like, while motions control the direction of view as shown in FIG. 9.
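The nine modes amount to a table keyed by which interface currently drives the Static Class and which drives the Displaced Class. The following Python sketch encodes that table as listed above. The function name `navigation_mode`, the string labels and the handling of a class with no active interface (it is treated as following the other class, and `None` is returned when both are idle) are assumptions of this example rather than behaviour stated in the specification.

```python
from typing import Optional

# (static-class interface, displaced-class interface) -> mode
MODES = {
    ("gesture", "gesture"): "M1",
    ("motion", "motion"): "M2",
    ("speech", "speech"): "M3",
    ("motion", "gesture"): "M4",
    ("gesture", "motion"): "M5",
    ("speech", "motion"): "M6",
    ("speech", "gesture"): "M7",
    ("gesture", "speech"): "M8",
    ("motion", "speech"): "M9",
}


def navigation_mode(static_iface: Optional[str],
                    displaced_iface: Optional[str]) -> Optional[str]:
    """Look up the mode for the interfaces driving each Navigation Class.

    Assumption: if only one class is being driven, the mode is the
    exclusive mode of that interface; if neither is, there is no mode.
    """
    if static_iface is None and displaced_iface is None:
        return None
    static_iface = static_iface or displaced_iface
    displaced_iface = displaced_iface or static_iface
    return MODES[(static_iface, displaced_iface)]
```

For example, `navigation_mode("motion", "gesture")` yields `"M4"`, matching the mode in which gestures displace the viewpoint while device motion sets the direction of view.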

As previously stated, when multiple interfaces are used for control, order is maintained through the use of prevalence. Here are some examples of prevalence:

Mode M4—Displaced gestural, static motion.

The user uses the Gesture Interface to alter her/his position through the World while, at the same time, using the Motion Interface (pointing the device towards the desired direction) to determine her/his orientation in 3D space.

Mode M5—Static gestural, displaced motion.

The user uses the Motion Interface to alter her/his position through the World while, at the same time, using the Gesture Interface (for example using a touch screen interaction) to determine her/his orientation in 3D space.

Mode M8—Static gestural, displaced speech.

The user uses the Speech Interface to alter her/his position through the World while, at the same time, using the Gesture Interface (for example using a touch screen interaction) to determine her/his orientation in 3D space.

The present invention provides a Smooth and Adaptive Automatic Switching among these modalities. The method provides the user with a reasonably seamless transition between situations where the interaction changes (in one of the two navigation classes [static-displaced]) from Gesture or Speech to Motion. All the while, the system monitors changes in user interaction and the device's relative position and attitude and provides a real-time dynamic adaptation to the world's (relative or absolute) coordinates, returning (smoothly switching) the Static or Displaced Class control to the Gesture, Speech and Motion Interfaces respectively.

The following steps are used to achieve the smooth and adaptive automatic switching, as shown in FIG. 10.

Calibration

The purpose of this process is to perform a first alignment of Device and World's Coordinates and respective Frames of Reference in the following scenarios:

-   Computer Generated (Videogames—3D simulations)
-   Real World (Google StreetView like navigation)
-   Merged (Augmented Reality applications)

According to FIG. 10, the tilt, or device position, is aligned to user coordinates.

According to FIG. 10, after calibration, static and displaced states are processed on the left and right of the diagram. According to prevalence, as previously described, gestures are first processed, then motion and finally speech commands. If there is an active gesture, motion and speech are ignored during that gesture. If there is no active gesture, the system looks for a motion that exceeds a predetermined threshold. If there is an active motion, speech is ignored. Finally, if there are no active gestures and no active motion, any speech commands are processed. The system loops endlessly to accept new commands.
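One simple way to picture the calibration step of FIG. 10 is as the capture of an offset between the orientation the device reports and the orientation the World view should have at that moment; later sensor readings are then shifted by that offset. The Python sketch below handles only heading and pitch, in degrees. The `Calibration` class, the function names and the choice of a purely additive offset are assumptions for illustration; an actual embodiment may align full 3D frames of reference.

```python
from dataclasses import dataclass
from typing import Tuple


def wrap_degrees(angle: float) -> float:
    """Wrap an angle into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


@dataclass(frozen=True)
class Calibration:
    """Hypothetical offsets captured once, in calibration mode (M0)."""
    heading_offset: float
    pitch_offset: float


def calibrate(device_heading: float, device_pitch: float,
              world_heading: float = 0.0,
              world_pitch: float = 0.0) -> Calibration:
    """Record the offset that maps the current device pose onto the World pose."""
    return Calibration(wrap_degrees(world_heading - device_heading),
                       world_pitch - device_pitch)


def to_world(cal: Calibration, device_heading: float,
             device_pitch: float) -> Tuple[float, float]:
    """Convert a later device reading into World heading and pitch."""
    return (wrap_degrees(device_heading + cal.heading_offset),
            device_pitch + cal.pitch_offset)
```

With this arrangement, holding the device exactly as it was held during calibration reproduces the initial World view, and any subsequent turn of the device turns the view by the same amount.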

Sample pseudo code has been supplied to illustrate a possible embodiment of the present invention on a device that allows gestures, contains motion sensors, and can receive and process speech commands.

Embodiments, however, may be embodied in many different forms and should not be construed as being limited to the embodiment set forth herein. Rather, this preferred embodiment is provided so that this disclosure will be thorough and complete, and will fully convey the scope to those skilled in the art. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. It will be understood that although the terms first, second, third, etc., may be used herein to describe various elements, components, Classes or methods, these elements, components, Classes or methods should not be limited by these terms. These terms are only used to distinguish one element, component, Class or method from another element, component, Class or method.

Several descriptions and illustrations have been provided to aid in understanding the present invention. One with skill in the art will understand that numerous changes and variations may be made without departing from the spirit of the invention. Each of these changes and variations is within the scope of the present invention.

Processing Note:

The present example follows the preferred embodiment and applies the Prevalence method as described above in the Prevalence paragraph. This does not exclude further embodiments that use a different Prevalence assignment.

//PROCESS WORLD
init WorldView;
init DeviceViewFrustum;
(Virtual World View is established [locally or remotely] and displayed on device)

//PROCESS USER INTERACTION
NavigationState = M0
init GestureInterface;
init MotionInterface;
init SpeechInterface;
init StaticNavigationClass;
init DisplacedNavigationClass;
init DetectionThreshold; //(Sensor threshold of intervention)
init StaticNavigationState = NULL;
init DisplacedNavigationState = NULL;
init GestureNavigationState = NULL;
init MotionNavigationState = NULL;
init SpeechNavigationState = NULL;
//Sensors on the devices are continuously queried. Each of the possible
//inputs is associated with a set of actions that correspond to the
//Navigation Classes as explained above.

Main

get NavigationState
detect UserInteraction

// DETECT USER INTERACTION
// The STATIC and DISPLACED navigation classes are queried simultaneously
// to check if each of the Gesture, Motion and Speech Interfaces (in order
// of priority) is using any of the available actions present in the two
// classes (see the examples above).
for (NavigationClass = STATIC; DISPLACED)
    if GestureDetection ≠ NULL and DetectionThreshold ≠ NULL
        update GestureNavigationState (STATIC, DISPLACED)
    for (NavigationClass = STATIC; DISPLACED) = NULL
        if MotionDetection ≠ NULL and DetectionThreshold ≠ NULL
            update MotionNavigationState (STATIC, DISPLACED)
        for (NavigationClass = STATIC; DISPLACED) = NULL
            if SpeechDetection ≠ NULL and DetectionThreshold ≠ NULL
                update SpeechNavigationState (STATIC, DISPLACED)
then
// UPDATE USER INTERACTION STATE
    a. Static Class -> Interface used (Gesture-Motion-Speech).
    b. Displaced Class -> Interface used (Gesture-Motion-Speech).
    c. Calculation of new device vector.
Set NavigationState = M(0-9)
if NavigationState ≠ M0
    then NewNavigationState = TRUE
UpdateDeviceWindow

//Switch to Different Modalities

In the current example a Gesture Prevalence method is described; as a consequence, Smooth Switching between different modalities is explained considering such prevalence.

When a change is present and executed in the user fruition of the Classes and Interfaces (for instance going from a “gesture only” to one of the possible “gesture and motion” modalities), such change is detected and recorded in the updated user interaction.

When an action, previously performed in either the Static or the Displaced Class using either the Gesture or Speech Interface instructions, is subsequently dynamically performed using the Motion Interface, a new real-time query of the Device Position and Attitude continuously updates the Device Vector, and a transition trajectory path is calculated from the last non-Motion Interface coordinates to the currently updated Device Vector.

To allow the method to give the user the sensation that it is “smoothly following” the change in the use of the interface, the process, when required, can perform a programmed animation of the transition trajectory from the point of switch (Gesture or Speech to Motion executed action) to its immediate destination in real-time (considering the performance of the device and of any connection), finally adapting the device vector and window to the new request from the user.

get NavigationState
if NewNavigationState = TRUE
    get STATIC and DISPLACED Interfaces States
    if in STATIC or DISPLACED Classes, Speech and/or Motion Interface based actions = NULL
        if GestureInterface actions ≠ NULL
            compute GestureInterface instructions
            compute (animate) alignment to new position of DeviceViewFrustum
            UpdateDeviceWindow
    if in STATIC or DISPLACED Classes, Gesture and/or Motion Interface based actions = NULL
        if SpeechInterface actions ≠ NULL
            compute SpeechInterface instructions
            compute (animate) alignment to new position of DeviceViewFrustum
            UpdateDeviceWindow
    if in STATIC or DISPLACED Classes, Gesture and/or Speech Interface based actions = NULL
        if MotionInterface actions ≠ NULL
            compute MotionInterface Device Attitude Trajectory
            compute (animate) Trajectory Transition alignment to new position of DeviceViewFrustum
            UpdateDeviceWindow
    UpdateUserInteractionState
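The transition trajectory described above can be sketched as an eased interpolation from the last non-Motion Interface pose toward the Device Vector, with the target re-queried on every frame because the device keeps moving during the animation. In the Python sketch below a pose is simplified to `(x, y, z, heading)`. The smoothstep easing curve, the fixed number of frames and all function names are assumptions chosen for the example; the specification does not state which animation curve is used.

```python
from typing import Callable, Iterator, Tuple

Pose = Tuple[float, float, float, float]  # x, y, z, heading in degrees


def smoothstep(t: float) -> float:
    """Ease-in/ease-out curve on [0, 1] (an assumed choice of easing)."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)


def lerp_angle(a: float, b: float, t: float) -> float:
    """Interpolate between two headings along the shortest arc."""
    delta = (b - a + 180.0) % 360.0 - 180.0
    return (a + delta * t) % 360.0


def transition_pose(start: Pose, target: Pose, t: float) -> Pose:
    """Pose at progress t between the switch point and the target."""
    s = smoothstep(t)
    x, y, z = (p0 + (p1 - p0) * s for p0, p1 in zip(start[:3], target[:3]))
    return (x, y, z, lerp_angle(start[3], target[3], s))


def transition_frames(start: Pose, read_device_vector: Callable[[], Pose],
                      steps: int) -> Iterator[Pose]:
    """Yield one pose per frame, re-reading the moving target each time."""
    for i in range(1, steps + 1):
        yield transition_pose(start, read_device_vector(), i / steps)
```

Because the last frame uses a progress of 1, the animation always ends exactly on the most recent Device Vector, after which the Motion Interface can drive the view directly.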

Several descriptions and illustrations have been presented to aid in understanding features of the present invention. One with skill in the art will realize that numerous changes and variations may be made without departing from the spirit of the invention. Each of these changes and variations is within the scope of the present invention.

1. A system for navigation on a handheld device comprising: a gesture module that can receive gesture commands from a physical interface on the device or a remote rendering computer server by executing stored first computer instructions on a processor contained in the device; a motion module that produces motion commands by detecting motion of the device in 3-dimensional space by executing second stored computer instructions on said processor that read and process data from motion sensors contained in the device; a speech module that can decode voice commands spoken through a microphone contained in the device by executing third stored computer instructions on said processor; a graphics driver adapted to display an image on a display screen attached to the device; data from each of said gesture module, said motion module and said speech module combined in a prevalence module that executes fourth stored computer instructions on said processor to prioritize a gesture command over a motion command, and prioritize a motion command over a speech command, said prevalence module adapted to issue an action command based on either a gesture, a motion of the device, or a speech command to said graphics driver to cause said graphics driver to modify the image on the display according to said gesture, motion or speech command; and, wherein said prevalence module transitions smoothly between gesture, motion and speech commands; and wherein the gesture, motion and speech commands include the following modalities: M1—static and displaced gesture commands; M2—static and displaced motion commands; and wherein the gesture, motion and speech commands include at least one of: M3—static and displaced speech commands; M4—displaced gesture commands and static motion commands; M5—static gesture commands and displaced motion commands; M6—displaced motion commands and static speech commands; M7—displaced gesture commands and static speech commands; M8—static gesture commands and displaced speech commands; or M9—static motion commands and displaced speech commands.
2. The system for navigation of claim 1 wherein said device is a telephone or tablet computer.
3. The system for navigation of claim 1 wherein said display is a touch screen and gestures are entered on virtual controls on said touch screen.
4. The system for navigation of claim 1 wherein said sensors include at least a motion sensing device.
5. The system for navigation of claim 1 wherein said first, second, third and fourth set of computer instructions are stored in a memory contained in said device or in a remote computer server.
6. The system for navigation of claim 1 wherein said motion module generates a motion command when a device motion parameter exceeds a predetermined value.
7. The system for navigation of claim 1 wherein said gesture commands, said motion commands and said speech commands each belong either to a static class or a displaced class, wherein commands belonging to the static class change view direction of said image with no change in displacement, and commands belonging to the displaced class change displacement of said image with no change in view direction.
8. The system for navigation of claim 1 configured to provide a smooth and adaptive automatic switching among modalities M1, M2, M3, M4, M5, M6, M7, M8 and M9.
9. The system for navigation of claim 7 configured to provide a user with a seamless transition between situations where interaction changes from gesture or speech commands to motion commands in either the static or the displaced class.
10. A method for image navigation on a handheld computer device having a touch screen, camera or joystick, a set of motion or geo-location sensors and a microphone driving a speech recognition sub-system, said device also having a 3D graphic engine on board or connected to a remote rendering device and a current device window relating to a displayed image, comprising: receiving a gesture command from said touch screen, camera or joystick, and/or; receiving a motion command from said motion sensors, and/or; receiving a speech command from said speech recognition sub-system; computing static class data from a static gesture command, if present; else computing static data from a static motion command, if present; else computing static data from a static speech command if present; computing displaced class data from a displaced gesture command, if present; else computing displaced class data from a displaced motion command, if present; else computing displaced data from a displaced speech command if present; determining locally or remotely an update for said current device window based on said static class data and said displaced class data; commanding the 3D graphic engine locally or remotely to perform said update to the current device window for said displayed image; and wherein the gesture, motion and speech commands include at least one of the following modalities: static and displaced speech commands; displaced gesture commands and static motion commands; static gesture commands and displaced motion commands; displaced motion commands and static speech commands; displaced gesture commands and static speech commands; static gesture commands and displaced speech commands; or static motion commands and displaced speech commands.
11. The method of claim 10 further comprising providing a user with a seamless transition between situations where interaction changes from gesture or speech commands to motion commands in either a static or a displaced class.
12. The method of claim 10 wherein the handheld computer device is a cellular telephone or tablet computer.